Page content rank: an approach to the web content mining

نویسندگان

  • Jaroslav Pokorný
  • Jozef Smizanský
چکیده

Methods of web data mining can be divided into several categories according to a kind of mined information and goals that particular categories set: Web structure mining (WSM), Web usage mining (WUM), and Web Content Mining (WCM). The objective of this paper is to propose a new WCM method of a page relevance ranking based on the page content exploration. The method, we call it Page Content Rank (PCR) in the paper, combines a number of heuristics that seem to be important for analysing the content of Web pages. The page importance is determined on the base of the importance of terms which the page contains. The importance of a term is specified with respect to a given query q and it is based on its statistical and linguistic features. As a source set of pages for mining we use a set of pages responded by a search engine to the query q. PCR uses a neural network as its inner classification structure. We describe an implementation of the proposed method and a comparison of its results with the other existing classification system – PageRank algorithm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

متن کامل

A Survey Paper of Structure Mining Technique using Clustering and Ranking Algorithm

A survey of various link analysis and clustering algorithms such as Page Rank, Hyperlink-Induced Topic Search, Weighted Page Rank based on Visit of Links K-Means, Fuzzy K-Means. Ranking algorithms illustrated, Weighted Page Rank is more efficient than Hyperlink-induced Topic Search Whereas clustering algorithms has described Fuzzy Soft, Rough K-Means is a mixture of Rough K-Means and fuzzy soft...

متن کامل

Ranking WebPages Using Web Structure Mining Concepts

With the rapid growth of the Web, users get easily lost in the rich hyper structure on the web. Providing relevant information to the users to supply to their needs is the primary goal of the owners of these websites. Web mining is one of the techniques that could help the websites owner in this direction. Web mining was categorized into three categories such as web content mining, web usage mi...

متن کامل

Effective Learning to Rank Persian Web Content

Persian language is one of the most widely used languages in the Web environment. Hence, the Persian Web includes invaluable information that is required to be retrieved effectively. Similar to other languages, ranking algorithms for the Persian Web content, deal with different challenges, such as applicability issues in real-world situations as well as the lack of user modeling. CF-Rank, as a ...

متن کامل

Performance Enhancement in Collaborative Filtering Technique by Removing Shilling Effect

Web page content mining is traditional searching of Web pages with the help of content, while Search results mining is a further search of pages found from a previous search. Web content mining has an approach that is shilling effect in which rating can be done and gives improper results. In this work we proposed an algorithm which gives appropriate values than shilling effects. Keywords, web m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005